ABSTRACT


Data  farming  combines  the  rapid  prototyping  capability  of  certain  simulation  models  with  the  exploratory 
power  of  high  performance  computing  to  rapidly  generate  insight  into  questions.  The  aim  is  to  develop  a 
better  understanding  of  landscapes  of  possibilities  as  well  as  outliers  that  may  be  discovered  through 
simulation  experiments.  In  this  paper  we  will  provide  an  overview  of  the  overall  data  farming  process  as  well 
as  discuss  methods  and  techniques  that  are  used  within  the  process.  These  methods  include  the  application 
of  design  of  experiments  to  computational  experiments  in  an  iterative  process  of  team-based  rapid  model 
prototyping,  optimized  statistical  sampling  of  the  experimental  design  space,  high  performance  computing, 
multi-dimensional  analysis  and  visualization,  and  tools  and  interfaces  for  executing  these  actions. 

After  the  concept  of  data  farming  was  put  forth  in  1997,  the  United  States  Marine  Corps  utilized  the 
techniques  in  their  Project  Albert.  This  project  focused  on  questions  that  were  fundamental  to  decision 
makers,  but  could  not  be  answered  through  traditional  methods.  It  relied  on  the  combination  of  small 
simulation  models,  high  performance  computing,  and  data  farming.  During  the  project,  which  existed  from 
1998 to 2006, an international community of interest developed around the topic of data farming.
Multi-disciplinary teams of researchers, military officers, and subject matter experts have been using the techniques
in  collaborative  environments  since  the  first  international  workshop  in  1999  and  have  continued  since  the 
project  ended. 

DATA FARMING IN SUPPORT OF MILITARY DECISION MAKERS


A  great  deal  of  sharing  and  knowledge  transfer  takes  place  at  these  International  Data  Farming  Workshops 
(IDFWs).  In  the  past  year,  IDFW  17  was  held  in  Garmisch-Partenkirchen,  Germany  and  IDFW  18  was  held 
in  Monterey,  California,  USA.  At  the  Naval  Postgraduate  School  in  Monterey,  much  continued  work  takes 
place  at  the  SEED  (Simulation  Experiments  and  Efficient  Designs)  Center  for  Data  Farming.  And  much 
work  has  taken  place  in  other  NATO  and  PfP  countries.  In  our  paper,  we  describe  these  and  other  continuing 
data  farming  efforts  around  the  world,  including  the  exploratory  work  on  data  farming  support  to  NATO  by 
NMSG  ET-029. 

We  also  describe  various  efforts  to  develop  simulation  models  and  apply  them  within  data  farming 
environments  in  support  of  military  decision  makers.  One  such  model  is  the  agent-based  model  PAX, 
developed  by  EADS  Germany  on  behalf  of  the  German  Bundeswehr.  PAX  has  successfully  been  applied 
during  the  international  workshops  since  2002.  In  that  context,  PAX  has  been  used  in  concert  with  data 
farming,  allowing  the  possibility  of  performing  thousands,  even  millions  of  simulation  runs  on  high 
performance  computers.  PAX  focuses  on  analyses  dealing  with  Peace  Support  Operations  (PSO)  and, 
recently,  operations  in  support  of  Humanitarian  Assistance. 

1.0  DATA  FARMING 

Imagination  is  more  important  than  knowledge.  Knowledge  is  limited.  Imagination  encircles  the  world. 

— Albert  Einstein 

Data  farming  combines  the  rapid  prototyping  capability  of  certain  simulation  models  with  the  exploratory 
power  of  high  performance  computing  to  rapidly  generate  insight  into  questions.  The  aim  is  to  develop  a 
better  understanding  of  landscapes  of  possibilities  as  well  as  outliers  that  may  be  discovered  through 
simulation  experiments.  Data  farming  focuses  on  a  more  complete  landscape  of  possible  system  responses, 
rather  than  attempting  to  pinpoint  an  answer.  This  “big  picture”  solution  landscape  is  an  invaluable  aid  to  the 
decision  maker  in  light  of  the  complex  nature  of  scenarios  that  NATO  forces  are  faced  with  in  today’s 
uncertain  world.  Data  farming  allows  the  decision  maker  to  more  fully  understand  the  landscape  of 
possibilities  and  thereby  make  more  informed  decisions.  Data  farming  also  allows  for  the  discovery  of 
outliers  that  may  lead  to  insightful  findings. 

In  the  past,  data  farming  has  been  used  to  seek  insight  into  questions  such  as: 

•  What  is  the  role  of  trust,  or  other  so-called  ‘intangibles’,  on  the  battlefield? 

•  What  impact  will  net-centric  warfare  and  complete  information  sharing  have  on  the  effectiveness  of 
military  units? 

•  How  can  we  best  protect  our  homeland  from  a  martyr-based  offense? 

•  How  can  a  bio-terrorist  attack  be  mitigated  in  a  free  society? 

•  What  system  characteristics  are  important  in  military  convoy  protection  systems? 

•  What  factors  are  most  important  in  defeating  improvised  explosive  devices? 

Of  course,  there  are  many  other  questions  which  are  of  interest,  and  these  are  but  a  few  of  the  ones  that  teams 
have  attempted  to  address  using  data  farming. 

Data  farming  is  an  iterative  team  process  (Horne  and  Meyer  2004).  Figure  1  presents  the  data  farming 
process  as  a  set  of  imbedded  loops.  This  process  normally  requires  input  and  participation  by  subject  matter 
experts,  modellers,  analysts,  and  decision-makers. 

Figure 1: Iterative Data Farming Process


After  the  concept  of  data  farming  was  put  forth  in  1997  (Horne,  1997),  the  United  States  Marine  Corps 
utilized  the  techniques  in  their  Project  Albert.  This  project  focused  on  questions  that  were  fundamental  to 
decision  makers,  but  could  not  be  answered  through  traditional  methods.  It  relied  on  the  combination  of 
small  simulation  models,  high  performance  computing,  and  data  farming.  During  the  project,  which  existed 
from  1998  to  2006,  an  international  community  of  interest  developed  around  the  topic  of  data  farming. 
Multi-disciplinary  teams  of  researchers,  military  officers,  and  subject  matter  experts  have  been  using  the 
techniques  in  collaborative  environments  since  the  first  international  workshop  in  1999  and  have  continued 
since  the  project  ended. 

A  great  deal  of  sharing  and  knowledge  transfer  takes  place  at  these  International  Data  Farming  Workshops 
(IDFWs). Results have been documented in the proceedings from workshops 13 through 18 (Horne and
Meyer, 2006, 2007a, 2007b, 2008a, 2008b, 2009). In the past year, IDFW 17 was held in
Garmisch-Partenkirchen, Germany and IDFW 18 was held in Monterey, California, USA. At the Naval Postgraduate
School  in  Monterey,  much  continued  work  takes  place  at  the  SEED  (Simulation  Experiments  and  Efficient 
Designs)  Center  for  Data  Farming.  And  much  work  has  taken  place  in  other  NATO  and  PfP  countries. 

NMSG ET-029 has been chartered to explore the possibilities in developing a task group to assess the data
farming  capabilities  that  NATO,  PfP,  and  Contact  Countries,  schools,  and  agencies  have  that  could 
contribute  to  the  development  of  improved  decision  support  to  NATO  forces.  Proof-of-concept  explorations 
involving  questions  and  models  of  interest  to  NATO  nations  also  will  be  undertaken  if  the  task  group  is 
approved.  This  group  would  then  take  the  results  of  both  the  assessment  and  explorations  to  recommend  and 
demonstrate  a  way  forward  for  implementing  data  farming  methods  and  processes  in  NATO  modelling  and 
simulation  contexts.  Harnessing  the  power  of  data  farming  to  apply  it  to  our  questions  is  essential  to 
providing  support  to  NATO  decision-makers  not  currently  available.  This  support  is  critically  needed  in 
answering  questions  inherent  in  the  scenarios  we  expect  to  confront  in  the  future.  Concurrently  there  is  a 
crucial  need  to  assess  the  capabilities  of  NATO,  PfP,  and  Contact  Countries,  schools,  and  agencies  that  could 
contribute  to  the  development  of  the  science  underlying  data  farming. 

The  simulations  available  to  NATO  analysts  are  often  large  and  complex.  And  even  the  smaller  more 
abstract  agent-based  models  can  have  many  parameters  that  are  potentially  significant  and  that  could  take  on 
many  values.  In  addition,  response  surfaces  can  be  highly  non-linear.  Thus  efficient  experimental  designs 
and  other  methods  have  been  employed  in  the  data  farming  process  to  begin  to  get  at  many  of  the  questions 
that  were  previously  intractable.  We  will  describe  some  of  our  efforts  in  this  area  in  the  next  section. 


2.0  DESIGNING  SIMULATION  EXPERIMENTS 

Political,  social  and  economic  programs  are  usually  more  valuable  than  conventional  military  operations  in 
addressing  the  root  causes  of  conflict  and  undermining  an  insurgency. 

— FM  3-24  Counterinsurgency 

The  U.S.  military  uses  models  for  course  of  action  analysis,  training  and  rehearsal,  and  evaluation  for 
acquisition (Committee on Organization Modeling, 2008). These models may lack utility — and are potentially
harmful — if  they  do  not  adequately  reflect  contemporary  operations.  Yet  as  conflicts  around  the  world  shift 
from  conventional  warfare  toward  irregular  warfare,  civilian  populations  are  often  the  determinants  of 
success.  Consequently,  interest  has  progressively  grown  in  the  development  of  models  that  can  simulate 
social  behavior  as  it  pertains  to  military  operations.  To  date  there  has  not  been  a  validated  model  designed 
for  irregular  warfare  that  covers  the  instruments  of  national  power:  Diplomatic,  Information,  Military, 
Economic  (DIME)  or  the  Political,  Military,  Economic,  Social,  Infrastructure,  and  Information  (PMESII) 
indicators  on  which  progress  in  irregular  warfare  is  based  (Marlin,  2009).  According  to  the  U.S.  Defense 
Modeling  and  Simulation  Analysis  Committee,  the  data  to  instantiate  such  a  model  is  either  nonexistent  or 
woefully  inaccurate,  and  the  validation  process  of  such  a  model  would  have  to  be  completely  rethought 
(Committee  on  Modeling  and  Simulation  for  Defense  Transformation,  2006). 

Yet  despite  the  dearth  of  data  and  lack  of  validated  models,  it  is  our  belief  that  many  in  the  defense 
simulation  community  can  dramatically  improve  their  analyses  by  using  design  of  experiments  (DOE) 
developed  specifically  for  exploring  complex  computer  models.  This,  in  turn,  can  provide  valuable  input  to 
decision-makers.  As  mentioned  earlier,  data  farming  refers  to  using  high  performance  computation  to  grow 
data.  The  harvested  data  can  then  be  analyzed  using  data  mining  or  other  statistical  techniques. 
Experimental  design  is  key  to  successful  data  farming  because  it  specifies  how  to  grow  the  data.  What  you 
reap  from  the  data  you  grow  depends  on  how  effectively  you  design  your  experiments. 

2.1 Potential Goals


All  models  are  wrong,  but  some  are  useful 
— George  Box  (1979) 


We  believe  that  the  following  goals  are  appropriate  for  studies  of  complex  simulations — such  as  those 
involving  models  of  irregular  warfare,  peace  support  operations,  and  defense  against  terror. 


(i)  Developing  a  basic  understanding  of  a  particular  model  or  system.  That  is,  seeking  insights  into  the 
high-dimensional  space  by  identifying  dominant  factors,  significant  interactions,  and  finding  regions,  ranges, 
and  thresholds  where  interesting  things  happen.  In  the  context  of  simulating  irregular  warfare  and  defense 
against  terror,  we  are  typically  interested  in  gaining  insights  about  potential  futures,  and  data  may  not  exist. 

(ii)  Finding  robust  decisions,  tactics,  or  strategies.  In  other  words,  identifying  settings  for  decisions,  tactics, 
or  strategies  that  tend  to  lead  to  good  outcomes — despite  the  presence  of  uncontrollable  uncertainties.  Since 
many  aspects  of  social  dynamics  and  irregular  warfare  are  uncertain  (and  may  even  be  difficult  to  measure), 
robustness  is  particularly  important  in  these  domains  of  M&S. 

(iii)  Comparing  the  merits  of  various  decisions  or  policies.  Simulation  experiments  can  provide  valuable 
insights  to  decision  makers  by  demonstrating  that  one  simulated  alternative  is  better  than  another,  or  by 
separating  alternatives  into  those  that  appear  promising  and  those  that  can  be  eliminated  from  further 
consideration. 


We  remark  that  this  requires  a  change  of  mindset  for  those  used  to  viewing  simulation  experiments  as  a 
means  of  obtaining  accurate  predictions  at  untried  inputs,  optimizing  a  function  of  the  simulation  inputs,  or 
calibrating  the  simulation  results  to  match  physical  data;  these  three  goals  were  espoused  by  Sacks  et  al. 
(1989)  in  their  classic  paper  on  computer  experiments,  but  we  argue  that  they  are  often  not  appropriate  for 
simulations  of  conflict  or  other  complex  phenomena.  For  example,  we  use  the  term  insights  rather  than 
predictions  because  it  is  almost  always  impossible  (due  to  a  dearth  of  data)  for  us  to  provide  any  warranty  on 
the  accuracy  of  the  predictions  for  potential  future  occurrences  (Hodges,  1991).  We  seek  robust  solutions 
rather  than  optimal  solutions  in  part  because  the  sheer  number,  uncertainty,  and  complexity  of  uncontrollable 
factors  usually  makes  optimization  infeasible  without  fixing  many  uncertain  variables.  Similarly,  calibration 
is  often  dubious  because  there  is  a  shortage  of  reliable  data  for  many  of  the  situations  that  we  wish  to 
simulate.  In  some  situations,  such  as  disaster  relief  efforts  or  large-scale  terrorist  attacks,  we  are  thankful  that 
these  events  are  not  every-day  occurrences,  and  hope  that  data  continue  to  be  scarce.  An  expanded 
discussion  on  this  new  view  on  the  goals  of  many  analyses  using  complex  simulations  can  be  found  in 
Kleijnen  et  al.  (2005). 


2.2  Choice  of  Design 

[We seek designs that] allow one to fit a variety of models and provide information about all portions of the
experimental region
— Santner et al. (2003)

The  selection  of  an  experimental  design  for  simulation  experiments  depends,  of  course,  on  many  things. 
Among  the  primary  considerations  are  the  number  of  samples  that  can  be  taken,  the  number  and  levels  of 
factors  we  desire  to  vary,  any  a  priori  assumptions  on  the  response,  and  the  types  of  models  (e.g.,  regression 
or graphical models) that we would like to be able to fit using the response data. Fortunately, a rich variety
of designs have been developed that are appropriate for different experimental settings.

Designs for physical experiments are typically applied in settings where there are not too many factors being explored
and  the  response  surfaces  are  fairly  simple;  e.g.,  the  response  in  the  region  of  interest  is  a  reasonably  smooth 
(e.g.,  a  linear  or  quadratic)  function  of  the  input  factors.  Despite  their  ubiquity,  these  may  not  be  the  most 
appropriate  designs  for  simulation  experiments  because  of  the  complexity  of  the  model;  they  also  fail  to  take 
into  consideration  the  additional  degree  of  control  that  a  simulation  analyst  has  over  someone  conducting 
live  experiments.  More  recently,  designs  have  been  (and  continue  to  be)  developed  for  situations  in  which 
there  are  many  variables  requiring  investigation,  and  the  relationships  between  the  inputs  and  the  outputs  are 
more  complex.  These  are  the  designs  that  our  experience  suggests  are  most  appropriate  for  extracting 
information in irregular warfare and counter-terror studies. In these situations there are usually multiple
responses  of  interest  and  little  a  priori  knowledge  about  the  forms  the  response  function  may  take.  It  thus 
seems  prudent  to  adopt  Santner  et  al.’s  (2003)  principle  of  selecting  designs  that  “allow  one  to  fit  a  variety  of 
models and provide information about all portions of the experimental region.” While it is almost always
impractical to vary more than a handful of factors in physical experiments, it is feasible, with the right
software and hardware, to vary scores, hundreds, or even more factors in computational experiments.
Indeed, facilitating this is what the SEED Center for Data Farming is all about.

An  in-depth  discussion  of  the  types  of  designs  is  beyond  the  scope  of  this  paper.  Instead,  we  refer  the  reader 
to  Sanchez  and  Wan  (2009)  for  details  about  a  variety  of  designs,  as  well  as  a  flowchart  and  table  to  assist 
analysts in choosing a design. However, of the multitude of designs we have used in scores of simulation
studies, the closest to an all-purpose class of designs, when the factors of interest are mostly continuous and
there is considerable a priori uncertainty about the response, are those based on Latin hypercube (LH) sampling.
When constructing LH design matrices, the input variables are treated as random variables with known
distributions. For each input variable $x_i$, “all portions of its distribution [are] represented by input values” by
dividing its range into “n strata of equal marginal probability 1/n, and [sampling] once from each stratum”
(McKay et al., 1979). For ease of generation and to provide good space-filling, we usually sample from a
discrete uniform distribution. That is, for each $x_i$, the n equally spaced input values are assigned at random
to the n cases. This generates column $x_i$ in the design matrix, and is done independently for each of the k
input variables.
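
The construction just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SEED Center's spreadsheet or software: the function name `random_latin_hypercube`, its parameters, and the default factor range [0, 1] are introduced here purely for the example.

```python
import random


def random_latin_hypercube(n, k, low=0.0, high=1.0, seed=None):
    """Return an n-by-k random Latin hypercube as a list of rows.

    Each column holds the same n equally spaced levels between `low`
    and `high`, assigned to the n cases in an independent random order.
    """
    if n < 2:
        raise ValueError("need at least two design points")
    rng = random.Random(seed)
    step = (high - low) / (n - 1)
    levels = [low + i * step for i in range(n)]

    columns = []
    for _ in range(k):
        column = levels[:]      # every factor uses all n levels exactly once
        rng.shuffle(column)     # random assignment of levels to cases
        columns.append(column)

    # Transpose so that each row is one simulation run (design point).
    return [list(row) for row in zip(*columns)]


# Example: 5 design points for 3 factors, each scaled to [0, 1].
design = random_latin_hypercube(n=5, k=3, seed=1)
```

Each row of `design` would then be rescaled to the actual factor ranges and passed to the simulation as one run.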

We have found LHs to be good all-purpose designs for several reasons: (i) design flexibility: We can
readily generate an LH for almost any combination of continuous factors and sampling budget. Indeed, if n is
large, simple rounding enables us to generate reasonable designs for just about any combination of number
of factors and number of levels; (ii) space-filling: LHs sample throughout the experimental region, not just
at corner points. Specifically, if we look at any group of factors we will find a variety of combinations of
levels; and (iii) analysis flexibility: The resultant output data allow us to fit many different models to
multiple measures of effectiveness (MOEs). In particular, these designs permit us to simultaneously screen many
factors for significance and fit very complex meta-models to a handful of dominant variables. This flexibility
also extends to visual investigations of the data, as we get many views of the relationships between inputs and outputs.

Randomly generated LHs have been used for many studies over the years. For any given combination of
sampling budget (n) and factors (k), there are $(n!)^{k-1}$ possible LH designs generated from discrete uniforms,
as specified above. Rather than select one of these at random, we prefer to use a design matrix whose
columns are orthogonal (or nearly orthogonal) and that has good space-filling properties; see Cioppa and
Lucas (2007). A spreadsheet containing several variations of these “good” LHs for designs involving up to
29 factors can be downloaded from the SEED Center’s website (http://harvest.nps.edu). This
spreadsheet is regularly updated as new designs become available.
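
To illustrate why a screened design can be preferable to a single random draw, the sketch below counts the $(n!)^{k-1}$ candidate designs and then keeps, out of a fixed number of random LHs, the one with the smallest maximum absolute pairwise column correlation. This brute-force search is a stand-in chosen for brevity. It is not the construction of Cioppa and Lucas (2007), it ignores the space-filling criterion, and all function names are hypothetical.

```python
import itertools
import math
import random


def count_lh_designs(n, k):
    """Number of distinct LHs for n points and k factors: (n!)^(k-1)."""
    return math.factorial(n) ** (k - 1)


def random_lh_columns(n, k, rng):
    """Return k columns, each a random permutation of the levels 1..n."""
    columns = []
    for _ in range(k):
        column = list(range(1, n + 1))
        rng.shuffle(column)
        columns.append(column)
    return columns


def correlation(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_abs_correlation(columns):
    """Largest absolute correlation over all pairs of columns (needs k >= 2)."""
    return max(abs(correlation(a, b))
               for a, b in itertools.combinations(columns, 2))


def best_of_random_lhs(n, k, tries, seed=None):
    """Keep the least-correlated design among `tries` random LHs."""
    rng = random.Random(seed)
    best, best_score = None, float("inf")
    for _ in range(tries):
        candidate = random_lh_columns(n, k, rng)
        score = max_abs_correlation(candidate)
        if score < best_score:
            best, best_score = candidate, score
    return best, best_score


# Example: 17 design points, 4 factors, 200 random candidates.
design, score = best_of_random_lhs(n=17, k=4, tries=200, seed=1)
```

Even this crude search only samples a vanishing fraction of the candidate designs, which is one reason catalogued designs with known properties are convenient in practice.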

There are many situations where designs other than LHs are more appropriate. This is especially true if we
have  goals  such  as  factor  screening  or  are  willing  to  make  certain  assumptions  (e.g.,  the  response  is  linear  in 
the  region  of  interest)  about  the  relationships  between  inputs  and  outputs.  Our  goal  is  to  provide  simulation 
experimenters  with  a  portfolio  of  readily  available  designs  for  use  in  high-dimensional  explorations.  For 
example,  fractional  factorial  and  central  composite  designs  have  been  cornerstones  in  industrial  and 
laboratory  experimentation  for  decades  (Montgomery,  2005).  These  designs  allow  experimenters  to  estimate 
main  effects  (linear  and  perhaps  quadratic)  and  lower  order  (e.g.,  two-way)  interactions.  In  most  cases, 
however,  these  experiments  involve  a  relatively  modest  set  of  factors.  Consequently,  almost  all  readily 
available resolution V designs are for experimental situations involving fewer than about a dozen factors;
see,  for  example,  NIST/SEMATECH  (2009). 

To  enable  analysts  to  easily  generate  large  resolution  V  fractional  factorial  and  central  composite  designs,  an 
algorithm  utilizing  a  fast  Walsh  transformation  (Sanchez  and  Sanchez,  2005)  is  available  on  the  Center’s 
website. This algorithm has been used to generate designs as big as $2^{443-423}$, and catalog them with a simple
list of Walsh indices rather than extensive tables of confounding patterns. While $2^{20}$ (roughly a million) runs
is a lot, we have taken even larger samples in some of our simulation studies (Vinyard and Lucas, 2002).
Furthermore, $2^{20}$ is a lot less than $2^{443}$.
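
As a small-scale illustration of what resolution V buys, the sketch below builds the textbook $2^{5-1}$ half fraction (generator E = ABCD) and checks that all five main-effect columns and all ten two-way interaction columns are mutually orthogonal in only 16 runs. This is not the Walsh-index algorithm of Sanchez and Sanchez (2005); the function names are hypothetical and the example is deliberately tiny.

```python
import itertools


def half_fraction_2_5_1():
    """Return the 16 runs of a 2^(5-1) design with generator E = ABCD."""
    runs = []
    for a, b, c, d in itertools.product((-1, 1), repeat=4):
        e = a * b * c * d  # defining relation I = ABCDE, so resolution V
        runs.append((a, b, c, d, e))
    return runs


def effect_columns(runs):
    """Contrast columns for all main effects and two-way interactions."""
    k = len(runs[0])
    columns = {}
    for j in range(k):
        columns[(j,)] = [run[j] for run in runs]
    for i, j in itertools.combinations(range(k), 2):
        columns[(i, j)] = [run[i] * run[j] for run in runs]
    return columns


def all_orthogonal(columns):
    """True if every pair of contrast columns has a zero dot product."""
    return all(sum(x * y for x, y in zip(u, v)) == 0
               for u, v in itertools.combinations(columns.values(), 2))


design = half_fraction_2_5_1()
columns = effect_columns(design)
resolution_v_ok = all_orthogonal(columns)
```

Because the contrasts are orthogonal, each main effect and two-way interaction can be estimated independently of the others, using half the runs of the full $2^5$ factorial.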

Other  approaches  that  can  be  useful  for  computational  experiments  include  sequential  screening  methods. 
Many  simulations  contain  a  large  number  of  input  factors,  of  which  only  a  small  proportion  have  noteworthy 
effects.  In  such  cases,  one  is  often  interested  in  identifying  those  significant  factors — perhaps  so  that 
additional  experimental  efforts  can  focus  on  them.  Recently,  several  sequential  methods  for  simulation 
screening  have  been  proposed.  First,  Bettonvil  and  Kleijnen  (1997)  proposed  a  method  called  sequential 
bifurcation  (SB)  that  iteratively  screens  groups  of  factors  until  each  input  factor  is  classified  as  either 
significant  or  not.  They  show  that  this  approach  is  very  efficient  when  the  directions  of  effects  are  known 
and there are only a small proportion of critical effects. Wan et al. (2006, 2009) improve on SB, with a
method they call controlled sequential bifurcation (CSB), by allowing the user to specify limits on both Type
I error and power for all effects. Building on Wan et al.’s CSB designs, hybrid two-phase approaches have been
developed that combine efficient fractional factorial experiments with CSB. These new designs have proven robust to
some  of  the  strong  assumptions  required  in  SB  methods  (such  as  being  able  to  specify  the  direction  of 
effects)  and  they  are  surprisingly  effective  and  efficient;  see  Sanchez  et  al.  (2009),  Shen  et  al.  (2009). 
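
The group-splitting idea behind sequential bifurcation can be sketched as follows. This is a deliberately simplified, deterministic version that assumes a first-order model with no noise and all effects non-negative; it omits the statistical error control that distinguishes CSB and the hybrid methods. The `simulate` callable, the toy model, and the threshold are hypothetical.

```python
def sequential_bifurcation(simulate, k, threshold):
    """Classify k two-level factors by repeatedly splitting groups.

    `simulate` takes a list of k levels (+1 high, -1 low) and returns a
    number. Returns (sorted indices of important factors, runs used).
    """
    cache = {}

    def y(j):
        # Output with factors 0..j-1 set high and the remaining ones low.
        if j not in cache:
            cache[j] = simulate([1] * j + [-1] * (k - j))
        return cache[j]

    important = []
    stack = [(0, k)]  # half-open groups [lo, hi) still to be examined
    while stack:
        lo, hi = stack.pop()
        # Under a first-order model this is the sum of the group's effects.
        group_effect = (y(hi) - y(lo)) / 2.0
        if group_effect <= threshold:
            continue  # the whole group is declared unimportant
        if hi - lo == 1:
            important.append(lo)
        else:
            mid = (lo + hi) // 2
            stack.append((lo, mid))
            stack.append((mid, hi))
    return sorted(important), len(cache)


# Toy first-order model: 16 factors, only factors 3 and 10 matter.
BETAS = [0.0] * 16
BETAS[3] = 5.0
BETAS[10] = 2.0


def toy_model(levels):
    return 10.0 + sum(b * x for b, x in zip(BETAS, levels))


found, n_runs = sequential_bifurcation(toy_model, k=16, threshold=1.0)
```

In this toy case the two active factors are isolated with fewer runs than testing the 16 factors one at a time would need, because whole groups with a negligible aggregate effect are discarded at once.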

2.3  Benefits  of  Designed  Experiments 

What  may  not  be  obvious  at  first  glance  is  how  quickly  it  becomes  computationally  infeasible  to  conduct  full 
factorial  designs  on  individual  factors — even  for  a  relatively  simple  simulation  model.  For  example,  suppose 
that  the  simulation  has  twenty  factors  that  can  be  varied,  the  scenario  has  ten  factors,  and  there  are  four 
manipulations.  A  full  factorial  design  involving  only  low  and  high  levels  (i.e.,  no  intermediate  levels)  for 
each factor for this single organization has $2^{20} \times 2^{10} \times 4$ (about 4.3 billion) runs per replication. Even if the computational
model  runs  in  one  second,  this  requires  136  years  of  computer  processing  time!  And,  unless  the  response 
variances  can  reasonably  be  assumed  to  be  constant,  two  or  more  replications  are  needed  in  order  to  estimate 
the  effects  with  any  statistical  validity.  Given  that  there  are  hundreds  or  thousands  of  factors  in  many  models 
of  military  operations,  it  is  easy  to  see  why  practitioners  unaware  of  the  power  of  experimental  design  have 
limited  their  studies  to  a  small  number  of  factors  or  groups  of  factors.  However,  this  also  severely  limits  the 
types  of  insights  that  can  be  gleaned  from  a  single  simulation  study. 
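
The arithmetic behind the 136-year figure is easy to reproduce. The short sketch below assumes one second per run and a 365.25-day year; the helper name is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def full_factorial_runs(two_level_factors, extra_conditions=1):
    """Runs per replication for a two-level full factorial design."""
    return (2 ** two_level_factors) * extra_conditions


# Twenty simulation factors plus ten scenario factors, times four manipulations.
runs = full_factorial_runs(20 + 10, extra_conditions=4)

# Serial computing time at one second per run.
years = runs * 1.0 / SECONDS_PER_YEAR
```

Here `runs` equals $2^{32}$, and `years` comes out slightly above 136, matching the figure quoted in the text.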

3.0 SEED CENTER FOR DATA FARMING


Once  you  have  invested  the  effort  to  build  (and  perhaps  verify,  validate  and  accredit)  a  simulation  model, 
it’s  time  to  let  the  model  work  for  you!  -Lucas  and  Sanchez,  2009 


Complex  simulations  will  continue  to  be  an  increasingly  important  tool  for  those  charged  with  informing 
decision makers in defense, including irregular warfare and counter-terror operations. It is our belief that
much more information can be extracted from these simulations by combining relatively inexpensive,
high-performance computing; a new mindset on the information gleaned from many simulations; and DOE tools
for  high-dimensional  simulation  exploration.  Of  course,  the  vast  majority  of  simulation  practitioners  are  not 
experts  in  designing  simulation  experiments.  Thus,  we  must  make  doing  so  easy  for  them.  For  this  purpose 
we  have  created  the  SEED  Center  for  Data  Farming,  and  are  working  to  strengthen  ties  in  the  international 
defense  modeling  and  simulation  communities. 

The Center provides many resources online at http://harvest.nps.edu, including links to student
theses  that  use  a  variety  of  designs  on  a  diverse  set  of  models  and  studies.  Indeed,  SEED  Center  techniques 
have been used to investigate issues relating to the following areas: fighting the global war on terrorism,
convoy operations, peacekeeping, non-lethal weapons, urban combat, unmanned (air, ground, sea, and
subsurface) vehicles, logistics in support of urban humanitarian assistance, future network-enabled forces
(such as the Future Combat System (FCS), Marine Distributed Operations (DO), and the Future Force Warrior
(FFW)), and many more. These applications are a good source for finding information on the output
analysis techniques we have found useful for exploring large-sample, high-dimensional simulation output
data.

Spreadsheets  and  software  for  generating  the  designs  are  also  available  from  the  SEED  Center’s  web  pages, 
along  with  publications  that  provide  details  about  the  methodological  advances  and  a  variety  of  applications. 
Finally,  the  SEED  web  pages  have  links  to  other  resources,  including  information  about  the  bi-annual 
International  Data  Farming  Workshops.  The  next  IDFW  is  scheduled  for  November  2009  in  Auckland,  New 
Zealand. 

4.0 SIMULATION MODELS - TWO EXAMPLES

This section describes, by way of example, two efforts to develop simulation models with different scopes
and strengths. The first, an agent-based model named PAX (the Latin word for “peace”), was developed by EADS
Germany with the goal of modeling human factors and has successfully been used during the International Data
Farming Workshops since 2002. The second model, named ABSEM (agent-based sensor-effector model), is
also being developed by EADS Germany and focuses on a physics-based model of sensors and shooters. This
tool  has  been  used  during  the  IDFW  since  spring  2008.  Both  models  allow  the  possibility  of  performing 
thousands,  even  millions  of  simulation  runs  on  high  performance  computers  in  data  farming  analyses.  They 
have been developed on behalf of the German Bundeswehr. Because the two models are basically compatible at
the model and representation levels, it is planned to merge them into a single data-farmable model with strengths
in human behaviour representation, complex physics modelling, and combinations thereof.

4.1  The  agent-based  model  PAX 

PAX  models  human  factors  with  the  focus  on  the  behavior  and  interaction  of  both  military  personnel  and 
civilians.  The  modeling  approach  includes  the  process  of  human  decision-making,  the  psychological  aspects 
regarding  the  evolving  aggressiveness  of  non-military  entities,  and  a  detailed  representation  of  individuals 
and  groups  consisting  of  civilians  as  well  as  soldiers.  The  dynamics  of  emotional  states  such  as  fear  and 
anger and their impact on the behavior of actors as well as stress are also modeled. PAX allows analyses of
Peace Support Operations (PSO) and operations in support of Humanitarian Assistance, and enables
investigations in the context of irregular warfare (IW).

PAX agents basically represent individuals. The civilian agent model contains three basic processes which
determine the individual level of aggression. Needs and emotions influence the level of aggression and a
process of de-individuation determines the contribution of personal norms of anti-aggression. "Need",
"anger", "fear" and the "readiness for aggression" are modeled as state variables. The values of these
variables change dynamically, depending on events in the agents’ environment (e.g. actions of other
agents). De-individuation describes a state of the agent in which he considers himself as part of a crowd and
not as an individual. It is the basic concept that makes it possible to model groups of agents and group cohesiveness.
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Figure  2:  The  (simplified)  core 
of  a  civilian  agent  in  the  model  PAX 
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Figure  2  depicts  the  civilian  agent  model.  The  civilian  agent's  behavior  mainly  depends  on  his  motivation. 
Besides  fear  and  anger,  there  is  (among  others)  a  basic  motive  to  satisfy  a  certain  need,  e.g.  to  get  a  food 
package.  Motives  are  correlated  each  to  a  specific  goal  and  compete  against  each  other.  The  motive  with  the 
highest  intensity  at  a  particular  point  of  time  is  selected  as  action-leading  motive  and  predetermines  the  main 
direction  of  the  behavior. 
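
The selection of the action-leading motive can be illustrated with a minimal Python sketch. It is not taken from the PAX implementation: the class `Motive`, the function `select_action_leading_motive`, the intensity scale from 0 to 1, and the goals attached to fear and anger are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Motive:
    """One competing motive; names and intensity scale are illustrative."""
    name: str
    goal: str
    intensity: float  # assumed to lie in [0, 1]


def select_action_leading_motive(motives):
    """Return the motive with the highest intensity at this point in time."""
    if not motives:
        raise ValueError("an agent needs at least one motive")
    return max(motives, key=lambda m: m.intensity)


# Hypothetical snapshot of one civilian agent's motives.
motives = [
    Motive("need", "get a food package", 0.60),
    Motive("fear", "move away from the threat", 0.30),
    Motive("anger", "act aggressively", 0.45),
]
leading = select_action_leading_motive(motives)
print(leading.name, "->", leading.goal)
```

In a running simulation the intensities would be recomputed as events occur, so the action-leading motive can change from one time step to the next.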

The soldier agents in PAX can be assigned different tasks, such as guarding the entrance to a camp
or supporting the distribution of food packages. If certain civilians act aggressively, the
soldiers may react with pacifying, threatening or defensive actions towards these civilians. The
soldiers' rules of engagement (ROE) can be implemented as rule sets that describe the agents’ goals
and personalities. The following is a simple example of a mainly de-escalating rule set: if
civilians act aggressively, soldiers always try to calm the situation by pacifying the civilians,
no matter what the situation is like. Threatening or defensive actions are never chosen.
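
The de-escalating rule set described above could be written down as follows. This is only a sketch under stated assumptions: the enumeration `SoldierAction`, the function `deescalating_rule_set`, and the fallback action `CONTINUE_TASK` for non-aggressive civilians are hypothetical and do not reflect the actual PAX rule set format.

```python
from enum import Enum


class SoldierAction(Enum):
    PACIFY = "pacify"
    THREATEN = "threaten"
    DEFEND = "defend"
    CONTINUE_TASK = "continue task"  # assumed fallback, not named in the text


def deescalating_rule_set(civilian_is_aggressive, situation=None):
    """Mainly de-escalating rule set: always pacify aggressive civilians.

    `situation` is accepted but deliberately ignored, mirroring
    "no matter what the situation is like". THREATEN and DEFEND are
    never returned by this rule set.
    """
    if civilian_is_aggressive:
        return SoldierAction.PACIFY
    return SoldierAction.CONTINUE_TASK


print(deescalating_rule_set(True, situation="crowd at the camp entrance"))
```

A more robust rule set would, by contrast, inspect `situation` and allow threatening or defensive actions under some conditions; comparing such alternatives is the kind of question the ROE studies address.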

PAX  agents  interact  with  each  other.  A  soldier  agent’s  actions  such  as  pacifying,  threatening  or  defending 
are,  in  a  first  step,  assessed  by  the  other  agent  (event  assessment).  Depending  on  the  type  of  event  and  the 
agent's  personality,  the  cognitive  assessment  may  evoke  fear,  anger  and  arousal,  and  possibly  cause  stress. 

Understanding the complex dynamics of human behavior and interaction is extremely important for
PSO. The kind of challenge that comes with such operations is perhaps best described by an example:
Recent war activity and the relocation of civilians to refugee camps have given rise to a new set
of challenges for military forces. The challenges faced by military personnel stationed in a
refugee zone are quite different from those in a classic conflict situation such as a war zone. For
example, military forces stationed in a refugee camp are usually not confronted with a heavily
armed enemy, but with hungry, scared, or in some cases enraged civilians. The involvement of
military forces requires an understanding of the given situation and of the contextual behaviors of
the people involved. In the refugee scenario, it is necessary to understand how civilians will
react within the camps. Important questions include: Will the civilians in the refugee camp remain
peaceful or will they become aggressive? Should the soldiers keep a low profile or take strong
action to maintain peace in the camp? What level of involvement is necessary to de-escalate a
situation created by an enraged civilian? Are the current rules of engagement adequate, or should
they be modified to better address issues that may arise in a refugee camp? The answers to these
questions are required for the operational and tactical aspects of military involvement in such a
refugee scenario.

PAX studies attempt to analyze these kinds of questions using the human factors and behaviour
modeling described above. The main application of PAX is to study and potentially modify current
Rules of Engagement (ROE) for specific scenarios in the context of PSO. PAX is also used to better
understand situations of interest and how they could evolve, with the goal of identifying
indicators by which potential escalations might be recognized at an early stage. Examples of
examined scenarios are operating refugee camps, securing elections, and crowd and riot control.


Figure 3: Refugee Camp in PAX 3.0 and PAX 3D

4.2 Agent-based sensor-effector model (ABSEM)

Analyzing the advantages of networked sensors and effectors for military capabilities in network
centric operations (NCO) is becoming more and more important in times of asymmetric warfare.
Earlier studies showed, however, that existing agent-based models are rather limited when it comes
to modeling and simulating complex technical systems on a sound physical basis. For this reason, a
new agent-based model was developed that aims to support the analysis of combinations of various
sensor and effector systems in NCO while taking the underlying physical theories into account. The
following objectives for modeling complex technical systems were defined: examining the performance
and value of sensors and effectors (existing ones, those under development, and future ones) during
military operations; analyzing the combination of various sensor and effector systems in NCO;
supporting simulation at different levels of abstraction; and allowing the model to be used within
the data farming process (as a distillation model) as well as within distributed simulation
networks (at a higher level of detail).

A  first  prototype  of  ABSEM  was  presented  to  the  International  Data  Farming 
Community  at  IDFW  17  (Figure  4). 

ABSEM is a realistic and dynamic agent-based simulation with dynamically and individually
interacting entities and the possibility to consider non-linearities and intangibles, so that the
highly dynamic character of multi-party scenarios can be captured. The driving force behind the
agents' behavior is a list of goals and tasks, e.g. motion tasks (patrol, move to, follow, escape),
battle tasks (attack, defend), communication tasks and transportation tasks. Tasks are allocated
initially (when setting up the scenario) and dynamically (at runtime, as a reaction to outside
influences or as a result of an internal decision process), and they are state-oriented (defining
what has to be achieved, but not how).

In addition to a distillation model for data farming, which runs significantly faster than real
time, ABSEM supports a high resolution model for detailed analyses in a federation of networked
simulation systems. This implies the necessity to stay close to RPR-FOM, where the agents' "public"
state (entity type, force identifier, damage state, spatial...) is described. In the high
resolution mode, ABSEM is able to take part in a VIntEL experiment and to provide computer
generated forces for it (see publication: Distributed Integrated Testbed in these proceedings). In
addition, ABSEM is used to prepare VIntEL experiments by data farming the relevant parameters and
setups, and to post-process the high resolution results via data farming.

For modeling sensors in ABSEM, a detailed physical approach is used, in contrast to the
probability-based approach used in previous agent-based simulations. So far, normal human viewing,
residual light amplifiers (short-wave IR), thermal cameras (mid- and long-wave IR, Figure 5) and
radar sensors are modeled. The ABSEM sensor input consists of background information (e.g.
temperature, contrast), target information (temperature, signature) as well as atmospheric
conditions and weather history. The ABSEM sensor output is a list of perceived entities in the
sensor's field of view / field of search, along with the attribute “detected”, “classified” or
“identified”.

For effectors in ABSEM, the goal is to determine the point or area of impact as well as the damage
done to the target. Two types of weapons are distinguished: weapons with point impact (e.g.
pistols, rifles) and weapons with area impact (e.g. grenades, mines). The hit probability and kill
probability depend on the weapon system, the environmental parameters and the target
characteristics. ABSEM differentiates between six damage levels: no effect, mobility kill,
firepower kill, mission kill, communication kill and catastrophic kill. Because ABSEM is a data
farming model, some idealization and stochastic modeling are necessary to simplify very complex
processes. The consequences are a parabolic idealization of the ballistic trajectory, an impact
point that depends on dispersion, and no detailed physical attrition modeling.
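
To illustrate what a parabolic idealization with dispersion can look like, the following Python sketch computes an impact point on flat ground for a drag-free shot. It is not the ABSEM effector model. The function `impact_point`, the Gaussian angular error, and all numerical values are assumptions chosen for illustration.

```python
import math
import random

G = 9.81  # gravitational acceleration in m/s^2


def impact_point(muzzle_speed, elevation_rad, azimuth_rad=0.0,
                 dispersion_rad=0.0, rng=None):
    """Impact point (x, y) on flat ground for a drag-free parabolic shot.

    Dispersion is modeled, as an assumption, by adding independent
    zero-mean Gaussian errors (std. dev. `dispersion_rad`) to both angles.
    """
    if rng is None:
        rng = random.Random()
    elevation = elevation_rad + rng.gauss(0.0, dispersion_rad)
    azimuth = azimuth_rad + rng.gauss(0.0, dispersion_rad)
    # Range of a parabola that starts and ends at the same height.
    distance = muzzle_speed ** 2 * math.sin(2.0 * elevation) / G
    return distance * math.cos(azimuth), distance * math.sin(azimuth)


elevation = math.radians(1.0)
ideal = impact_point(800.0, elevation)  # no dispersion
rng = random.Random(1)                  # fixed seed for repeatable runs
shots = [impact_point(800.0, elevation, dispersion_rad=0.002, rng=rng)
         for _ in range(5)]
print("ideal impact:", ideal)
for shot in shots:
    print("dispersed impact:", shot)
```

Seeding the random number generator per run keeps replications reproducible, which matters when many runs are farmed out and later compared.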


Figure 5: Infrared view

Figure 6: ABSEM camp protection scenario


During the last three IDFWs, a camp protection scenario was used to validate ABSEM and test its
performance (see Figure 6). In these experiments, the performance of different sensors was analyzed
under varying conditions (day and night, different atmospheric conditions and seasons, Figure 7),
using detection, classification and identification times and distances as measures. Additionally,
the performance of different weapons was examined by varying the marksman dispersion and recording
the damage caused.
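
Varying several conditions at once amounts to running the model over an experimental design. As a minimal sketch, the Python code below enumerates a full factorial design. The factor names and levels are illustrative assumptions, not the settings used at the IDFWs, and in practice a more economical design may be preferred over a full factorial one.

```python
from itertools import product

# Factor levels are illustrative assumptions, not the actual IDFW settings.
factors = {
    "sensor": ["human vision", "residual light amplifier",
               "thermal camera", "radar"],
    "time_of_day": ["day", "night"],
    "atmosphere": ["clear", "fog"],
    "season": ["summer", "winter"],
}


def full_factorial(factors):
    """Return one dict per design point, covering every combination of levels."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


design = full_factorial(factors)
print(len(design), "design points")
print(design[0])
```

Each design point would then be run many times with different random seeds, and the recorded measures collected per point.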


Figure  7:  ABSEM  weather:  fog,  day,  night 

As a result, the advantages of different sensor types for given scenarios and conditions can be
evaluated using the defined measures of effect (MOE), such as destroyed vehicles in a patrol convoy
(Figure 8). The factors that matter most for the MOE can be identified with regression trees, as
shown in Figure 9.
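
The idea behind a regression tree can be sketched in a few lines of Python. The code below finds only the first (root) split, i.e. the factor and threshold that best separate high from low MOE values. It is not the analysis tool used in the study, and the rows are made-up values for illustration only, not results from ABSEM. The names `sse`, `best_split`, `sensor_range_m` and `marksman_dispersion_mrad` are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, factors, response):
    """Return (factor, threshold, sse) of the best binary split.

    This is the root-node step of a regression tree: try every factor and
    every midpoint between adjacent observed values, and keep the split
    that minimizes the summed squared error of the two child nodes.
    """
    best = None
    for factor in factors:
        levels = sorted({row[factor] for row in rows})
        for low, high in zip(levels, levels[1:]):
            threshold = (low + high) / 2.0
            left = [r[response] for r in rows if r[factor] <= threshold]
            right = [r[response] for r in rows if r[factor] > threshold]
            score = sse(left) + sse(right)
            if best is None or score < best[2]:
                best = (factor, threshold, score)
    return best


# Made-up runs: (sensor range, marksman dispersion, destroyed vehicles).
raw = [(200, 1.0, 4), (200, 3.0, 5), (400, 1.0, 4), (400, 3.0, 4),
       (800, 1.0, 1), (800, 3.0, 2), (1600, 1.0, 1), (1600, 3.0, 1)]
rows = [{"sensor_range_m": r, "marksman_dispersion_mrad": d,
         "destroyed_vehicles": v} for r, d, v in raw]

split = best_split(rows, ["sensor_range_m", "marksman_dispersion_mrad"],
                   "destroyed_vehicles")
print(split)
```

On these made-up rows the first split falls on the sensor range, which is how a regression tree flags a factor as important. A full tree repeats the same step recursively within each child node.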

Figure 8: MOE over different sensor types

Figure 9: Regression tree